Abstract

We prove an elementary yet useful inequality bounding the maximal value of certain linear 
programs. This leads directly to a bound on the martingale difference for arbitrarily dependent 
random variables, providing a generalization of some recent concentration of measure results. 
The linear programming inequality may be of independent interest. 

1 Introduction 

1.1 Background 

Over the past decade there has been a flurry of new concentration of measure inequalities; we 
refer the reader to 0] for an in-depth survey, or |2HH1E1 for some more recent advances.

In [2], the martingale difference method was employed in a novel way to obtain a general
concentration inequality for dependent random variables, with respect to the (unweighted) Hamming
metric. At the core of that approach lies a certain linear programming inequality associated with
bounding martingale differences [2, Theorem 4.8]. In this paper, we give a considerably simpler
proof of a rather more general result, extending it to the weighted Hamming metrics. The
applications to measure concentration are immediate (culminating in Corollary 3.3); additionally,
it is hoped that the linear programming inequality and the technique employed for proving it
will find further applications.

Since the main focus of this paper is the inequality in Theorem 2.5, we forgo a detailed
discussion of measure concentration and how our bound relates to existing results. Such a 
discussion may be found in [21 Ej • 



1.2 Notational conventions 

Throughout this paper, $S$ will denote a finite set. Random variables are capitalized ($X$), specified
sequences (words) are written in lowercase ($x \in S^n$), the shorthand $X_i^j = (X_i, \ldots, X_j)$ is used for
all sequences, and word concatenation is denoted using the multiplicative notation: $x_i^j x_{j+1}^k = x_i^k$.

Similarly, if $w \in \mathbb{R}^n$ and $1 \le k \le \ell \le n$, then $w_k^\ell = (w_k, \ldots, w_\ell) \in \mathbb{R}^{\ell-k+1}$.

We use the indicator variable $\mathbf{1}_{\{\cdot\}}$ to assign 0-1 truth values to the predicate in $\{\cdot\}$. The
ramp function is defined by $(z)_+ = z\,\mathbf{1}_{\{z > 0\}}$. The positive reals are denoted by $\mathbb{R}_+ = (0, \infty)$.
The probability $\mathbf{P}$ and expectation $\mathbf{E}$ operators are defined with respect to the measure space
specified in context.



2 Linear programming inequality 

We begin with a natural generalization of some of the definitions in [2]. Fix a finite set $S$, $n \in \mathbb{N}$
and $w \in \mathbb{R}_+^n$. Then

1. $K_n$ denotes the set of all functions $\kappa : S^n \to \mathbb{R}$ (and $K_0 = \mathbb{R}$)

2. the weighted Hamming metric on $S^n \times S^n$ is defined by

\[ d_w(x, y) = \sum_{i=1}^n w_i \mathbf{1}_{\{x_i \ne y_i\}} \tag{1} \]

3. for $\varphi \in K_n$, its Lipschitz constant with respect to $d_w$, denoted by $\|\varphi\|_{\mathrm{Lip},w}$, is defined to be
the smallest $c$ for which

\[ |\varphi(x) - \varphi(y)| \le c\, d_w(x, y) \]

for all $x, y \in S^n$; any $\varphi$ with $\|\varphi\|_{\mathrm{Lip},w} \le c$ is called $c$-Lipschitz

4. for $v \in [0, \infty)$, define $\Phi^{+v}_{w,n} \subset K_n$ to be the set of all $\varphi$ such that $\|\varphi\|_{\mathrm{Lip},w} \le 1$ and

\[ 0 \le \varphi(x) \le v + \sum_{i=1}^n w_i, \qquad x \in S^n; \]

we omit the $+v$ superscript when $v = 0$, writing simply $\Phi_{w,n}$

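5. the marginal projection operator $(\cdot)'$ takes $\kappa \in K_n$ to $\kappa' \in K_{n-1}$ by

\[ \kappa'(y) = \sum_{x_1 \in S} \kappa(x_1 y), \qquad y \in S^{n-1}; \]

for $n = 1$, $\kappa'$ is the scalar $\kappa' = \sum_{x_1 \in S} \kappa(x_1)$

6. for $y \in S$, the $y$-section operator $(\cdot)_y$ takes $\kappa \in K_n$ to $\kappa_y \in K_{n-1}$ by

\[ \kappa_y(x) = \kappa(xy), \qquad x \in S^{n-1}; \]

for $n = 1$, $\kappa_y(\cdot)$ is the scalar $\kappa(y)$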

7. the functional $\Psi_{w,n} : K_n \to \mathbb{R}$ is defined by $\Psi_{w,0}(\cdot) = 0$ and

\[ \Psi_{w,n}(\kappa) = w_1 \sum_{x \in S^n} (\kappa(x))_+ + \Psi_{w_2^n,\,n-1}(\kappa'); \tag{2} \]

when $w_i \equiv 1$ we omit it from the subscript, writing simply $\Psi_n$

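8. the finite-dimensional vector space $K_n$ is equipped with the inner product

\[ \langle \kappa, \lambda \rangle = \sum_{x \in S^n} \kappa(x)\lambda(x) \]

9. two norms are defined on $\kappa \in K_n$: the $\Phi_w$-norm,

\[ \|\kappa\|_{\Phi_w} = \sup_{\varphi \in \Phi_{w,n}} |\langle \kappa, \varphi \rangle| \tag{3} \]

and the $\Psi_w$-norm,

\[ \|\kappa\|_{\Psi_w} = \max_{s = \pm 1} \Psi_{w,n}(s\kappa). \tag{4} \]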
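The definitions above can be made concrete with a small sketch. It assumes a function $\kappa \in K_n$ is stored as a Python dict mapping words (tuples over $S$) to real values, that the marginal projection sums out the first coordinate, that the $y$-section fixes the last coordinate, and that $\Psi_{w,0} \equiv 0$. All function names (`hamming_w`, `marginal`, `section`, `psi`, `psi_norm`) are hypothetical and chosen only for illustration.

```python
def hamming_w(x, y, w):
    """Weighted Hamming distance d_w(x, y) = sum_i w_i * 1{x_i != y_i}."""
    return sum(wi for xi, yi, wi in zip(x, y, w) if xi != yi)

def marginal(kappa):
    """Marginal projection kappa': sum out the first coordinate."""
    out = {}
    for x, val in kappa.items():
        out[x[1:]] = out.get(x[1:], 0.0) + val
    return out

def section(kappa, y):
    """y-section kappa_y: fix the last coordinate to y."""
    return {x[:-1]: val for x, val in kappa.items() if x[-1] == y}

def psi(kappa, w):
    """Psi_{w,n}(kappa) by the recursion (2), assuming Psi_{w,0} = 0."""
    if not w:
        return 0.0
    head = w[0] * sum(max(val, 0.0) for val in kappa.values())
    return head + psi(marginal(kappa), w[1:])

def psi_norm(kappa, w):
    """The Psi_w-norm (4): max over s = +1, -1 of Psi_{w,n}(s * kappa)."""
    negated = {x: -val for x, val in kappa.items()}
    return max(psi(kappa, w), psi(negated, w))

# Toy example on S = {0, 1}, n = 2 (values chosen arbitrarily).
w = (1.0, 2.0)
kappa = {(0, 0): 1.0, (0, 1): -2.0, (1, 0): 3.0, (1, 1): -1.0}
print(hamming_w((0, 1), (1, 1), w))
print(psi(kappa, w), psi_norm(kappa, w))
```

The dict representation keeps the recursion in `psi` a direct transcription of (2); it is exponential in $n$ and meant only for experimenting with tiny alphabets.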
Remark 2.1. For the special case $w_i \equiv 1$, $d_w$ is the unweighted Hamming metric used in [2]. It is
straightforward to verify that the $\Phi_w$-norm and the $\Psi_w$-norm satisfy the vector-space norm axioms for
any $w \in \mathbb{R}_+^n$; this is done in [2] for $w_i \equiv 1$. Since we will not be appealing to any norm properties
of these functionals, we omit the proof. Note that for any $y \in S$, the marginal projection and
$y$-section operators commute; in other words, for $\kappa \in K_{n+2}$, we have $(\kappa')_y = (\kappa_y)' \in K_n$ and so
we can denote this common value by $\kappa'_y \in K_n$:

\[ \kappa'_y(z) = \sum_{x_1 \in S} \kappa_y(x_1 z) = \sum_{x_1 \in S} \kappa(x_1 z y), \qquad z \in S^n. \]



The main result of this section is

Theorem 2.2. For all $w \in \mathbb{R}_+^n$ and all $\kappa \in K_n$, we have

\[ \|\kappa\|_{\Phi_w} \le \|\kappa\|_{\Psi_w}. \tag{5} \]
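As a quick sanity check on the statement, the case $n = 1$ can be worked out by hand. The computation below is an illustration rather than part of the original argument. It assumes the convention $\Psi_{w,0} \equiv 0$ and uses the fact that, for $n = 1$, every $\varphi : S \to [0, w_1]$ is automatically 1-Lipschitz with respect to $d_w$, so $\Phi_{w,1}$ is the full box $[0, w_1]^S$.

```latex
\begin{align*}
\|\kappa\|_{\Phi_w}
  &= \sup_{\varphi : S \to [0, w_1]} \Bigl| \sum_{x \in S} \kappa(x)\varphi(x) \Bigr| \\
  &= \max\Bigl\{ w_1 \sum_{x \in S} (\kappa(x))_+ ,\; w_1 \sum_{x \in S} (-\kappa(x))_+ \Bigr\} \\
  &= \max_{s = \pm 1} \Psi_{w,1}(s\kappa) = \|\kappa\|_{\Psi_w}.
\end{align*}
```

Under these assumptions the bound holds with equality when $n = 1$.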



Remark 2.3. We refer to (5) - more properly, to (9), from which the former immediately follows
- as a linear programming inequality for the reason that $F(\cdot) = \langle \kappa, \cdot \rangle$ is a linear function being
maximized over the finitely generated, compact, convex polytope $\Phi_{w,n} \subset \mathbb{R}^{S^n}$. We make no
use of this simple fact and therefore forgo its proof, but see [2, Lemma 4.4] for a proof of a
closely related claim. The term "linear programming" is a bit of a red herring since no actual
LP techniques are being used; for lack of an obvious natural name, we have alternatively referred
to (5) in previous papers and talks as the "$\Phi$-norm bound" or the "$\Phi$-$\Psi$ inequality."

The key technical lemma is a decomposition of $\Psi_{w,n}(\cdot)$ in terms of $y$-sections, proved in
[2] for the case $w_i \equiv 1$:



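Lemma 2.4. For all $n \ge 1$, $w \in \mathbb{R}_+^n$ and $\kappa \in K_n$, we have

\[ \Psi_{w,n}(\kappa) = \sum_{y \in S} \Bigl[ \Psi_{w_1^{n-1},\,n-1}(\kappa_y) + w_n \Bigl( \sum_{x \in S^{n-1}} \kappa_y(x) \Bigr)_+ \Bigr]. \tag{6} \]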



Proof. We proceed by induction on $n$. To prove the $n = 1$ case, recall that $S^0$ is the set containing
a single (null) word and that for $\kappa \in K_1$, $\kappa_y \in K_0$ is the scalar $\kappa(y)$. Thus, by definition of
$\Psi_{w,1}(\cdot)$, we have

\[ \Psi_{w,1}(\kappa) = w_1 \sum_{y \in S} (\kappa(y))_+, \]

which proves (6) for $n = 1$.

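Suppose the claim holds for some $n = \ell \ge 1$. Pick any $w \in \mathbb{R}_+^{\ell+1}$ and $\kappa \in K_{\ell+1}$ and examine

\begin{align*}
\sum_{y \in S} \Bigl[ \Psi_{w_1^\ell,\,\ell}(\kappa_y) + w_{\ell+1} \Bigl( \sum_{x \in S^\ell} \kappa_y(x) \Bigr)_+ \Bigr]
&= \sum_{y \in S} \Bigl[ w_1 \sum_{x \in S^\ell} (\kappa_y(x))_+ + \Psi_{w_2^\ell,\,\ell-1}(\kappa'_y) + w_{\ell+1} \Bigl( \sum_{x \in S^\ell} \kappa_y(x) \Bigr)_+ \Bigr] \\
&= w_1 \sum_{z \in S^{\ell+1}} (\kappa(z))_+ + \sum_{y \in S} \Bigl[ \Psi_{w_2^\ell,\,\ell-1}(\kappa'_y) + w_{\ell+1} \Bigl( \sum_{u \in S^{\ell-1}} \kappa'_y(u) \Bigr)_+ \Bigr], \tag{7}
\end{align*}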
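where the first equality follows from the definition of $\Psi_{w_1^\ell,\,\ell}$ in (2) and the second one from the
easy identities

\[ \sum_{y \in S} \sum_{x \in S^\ell} (\kappa_y(x))_+ = \sum_{z \in S^{\ell+1}} (\kappa(z))_+ \]

and

\[ \sum_{x \in S^\ell} \kappa_y(x) = \sum_{u \in S^{\ell-1}} \kappa'_y(u). \]

On the other hand, by definition we have

\[ \Psi_{w,\,\ell+1}(\kappa) = w_1 \sum_{z \in S^{\ell+1}} (\kappa(z))_+ + \Psi_{w_2^{\ell+1},\,\ell}(\kappa'). \tag{8} \]

To compare the r.h.s. of (7) with the r.h.s. of (8), note that the $w_1 \sum_{z \in S^{\ell+1}} (\kappa(z))_+$ term is
common to both and

\[ \sum_{y \in S} \Bigl[ \Psi_{w_2^\ell,\,\ell-1}(\kappa'_y) + w_{\ell+1} \Bigl( \sum_{u \in S^{\ell-1}} \kappa'_y(u) \Bigr)_+ \Bigr] = \Psi_{w_2^{\ell+1},\,\ell}(\kappa') \]

by the inductive hypothesis, applied to $\kappa' \in K_\ell$ with the weight vector $w_2^{\ell+1}$. This establishes (6) for $n = \ell + 1$ and proves the claim. □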
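The decomposition in Lemma 2.4 is easy to test numerically on small instances. The sketch below is not a proof; it compares both sides of (6) for a randomly drawn $\kappa$, assuming the dict-of-tuples representation of $\kappa$ and the convention $\Psi_{w,0} \equiv 0$. The helper names are hypothetical.

```python
import random
from itertools import product

def marginal(kappa):
    """Marginal projection kappa': sum out the first coordinate."""
    out = {}
    for x, val in kappa.items():
        out[x[1:]] = out.get(x[1:], 0.0) + val
    return out

def section(kappa, y):
    """y-section kappa_y: fix the last coordinate to y."""
    return {x[:-1]: val for x, val in kappa.items() if x[-1] == y}

def psi(kappa, w):
    """Psi_{w,n}(kappa) by the recursion (2), assuming Psi_{w,0} = 0."""
    if not w:
        return 0.0
    head = w[0] * sum(max(val, 0.0) for val in kappa.values())
    return head + psi(marginal(kappa), w[1:])

def lemma_rhs(kappa, w, S):
    """Right-hand side of (6): sum over y-sections plus the w_n ramp term."""
    total = 0.0
    for y in S:
        k_y = section(kappa, y)
        total += psi(k_y, w[:-1]) + w[-1] * max(sum(k_y.values()), 0.0)
    return total

random.seed(0)
S, n = (0, 1, 2), 3
w = [random.uniform(0.1, 2.0) for _ in range(n)]
kappa = {x: random.uniform(-1.0, 1.0) for x in product(S, repeat=n)}
lhs = psi(kappa, w)
rhs = lemma_rhs(kappa, w, S)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding.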

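Our main result, Theorem 2.2, is an immediate consequence of

Theorem 2.5. For all $n \ge 1$, $w \in \mathbb{R}_+^n$, $v \in [0, \infty)$ and $\kappa \in K_n$, we have

\[ \sup_{\varphi \in \Phi^{+v}_{w,n}} \langle \kappa, \varphi \rangle \le \Psi_{w,n}(\kappa) + v \Bigl( \sum_{x \in S^n} \kappa(x) \Bigr)_+. \tag{9} \]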



Proof. We will prove the claim by induction on $n$. For $n = 1$, pick any $w_1 \in \mathbb{R}_+$, $v \in [0, \infty)$ and
$\kappa \in K_1$. Since by construction any $\varphi \in \Phi^{+v}_{w_1,1}$ is $w_1$-Lipschitz with respect to the discrete metric
on $S$, $\varphi$ must be of the form

\[ \varphi(x) = \tilde\varphi(x) + \tilde v, \qquad x \in S, \]

where $\tilde\varphi : S \to [0, w_1]$ and $0 \le \tilde v \le v$ (in fact, we have the explicit value $\tilde v = (\max_{x \in S} \varphi(x) - w_1)_+$).
Therefore,

\[ \langle \kappa, \varphi \rangle = \langle \kappa, \tilde\varphi \rangle + \tilde v \sum_{x \in S} \kappa(x). \tag{10} \]

The first term in the r.h.s. of (10) is clearly maximized when $\tilde\varphi(x) = w_1 \mathbf{1}_{\{\kappa(x) > 0\}}$ for all $x \in S$,
which shows that it is bounded by $\Psi_{w_1,1}(\kappa)$. Since the second term in the r.h.s. of (10) is
bounded by $v \bigl( \sum_{x \in S} \kappa(x) \bigr)_+$, we have established (9) for $n = 1$.

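Now suppose the claim holds for $n = \ell$, and pick any $w \in \mathbb{R}_+^{\ell+1}$, $v \in [0, \infty)$ and $\kappa \in K_{\ell+1}$. By
the reasoning given above (i.e., using the fact that $0 \le \varphi \le v + \sum_{i=1}^{\ell+1} w_i$ and that $\varphi$ is 1-Lipschitz
with respect to $d_w$), any $\varphi \in \Phi^{+v}_{w,\ell+1}$ must be of the form $\varphi = \tilde\varphi + \tilde v$, where $\tilde\varphi \in \Phi_{w,\ell+1}$ and
$0 \le \tilde v \le v$. Thus we write $\langle \kappa, \varphi \rangle = \langle \kappa, \tilde\varphi \rangle + \tilde v \sum_{x \in S^{\ell+1}} \kappa(x)$ and decompose

\[ \langle \kappa, \tilde\varphi \rangle = \sum_{y \in S} \langle \kappa_y, \tilde\varphi_y \rangle, \tag{11} \]

making the obvious but crucial observation that

\[ \tilde\varphi \in \Phi_{w,\ell+1} \implies \tilde\varphi_y \in \Phi^{+w_{\ell+1}}_{w_1^\ell,\,\ell}. \]

Then it follows by the inductive hypothesis that

\[ \langle \kappa_y, \tilde\varphi_y \rangle \le \Psi_{w_1^\ell,\,\ell}(\kappa_y) + w_{\ell+1} \Bigl( \sum_{x \in S^\ell} \kappa_y(x) \Bigr)_+. \tag{12} \]

Applying Lemma 2.4 to (12), we have

\[ \sum_{y \in S} \langle \kappa_y, \tilde\varphi_y \rangle \le \sum_{y \in S} \Bigl[ \Psi_{w_1^\ell,\,\ell}(\kappa_y) + w_{\ell+1} \Bigl( \sum_{x \in S^\ell} \kappa_y(x) \Bigr)_+ \Bigr] = \Psi_{w,\ell+1}(\kappa). \tag{13} \]

This, combined with (11) and the trivial bound

\[ \tilde v \sum_{x \in S^{\ell+1}} \kappa(x) \le v \Bigl( \sum_{x \in S^{\ell+1}} \kappa(x) \Bigr)_+ \]

proves the claim for $n = \ell + 1$ and hence for all $n$. □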
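A numerical sanity check of (9) can be sketched as follows. It does not compute the supremum; it only samples members of $\Phi^{+v}_{w,n}$ from the separable subfamily $\varphi(x) = c + \sum_i w_i a_i(x_i)$ with $a_i : S \to [0, 1]$ and $c \in [0, v]$, which satisfies the range and Lipschitz constraints by construction. The sampler `sample_phi` and the dict representation of $\kappa$ are assumptions made for illustration, as is the convention $\Psi_{w,0} \equiv 0$.

```python
import random
from itertools import product

def marginal(kappa):
    """Marginal projection kappa': sum out the first coordinate."""
    out = {}
    for x, val in kappa.items():
        out[x[1:]] = out.get(x[1:], 0.0) + val
    return out

def psi(kappa, w):
    """Psi_{w,n}(kappa) by the recursion (2), assuming Psi_{w,0} = 0."""
    if not w:
        return 0.0
    head = w[0] * sum(max(val, 0.0) for val in kappa.values())
    return head + psi(marginal(kappa), w[1:])

def sample_phi(S, w, v, rng):
    """Random phi(x) = c + sum_i w_i * a_i(x_i), a member of Phi^{+v}_{w,n}."""
    a = [{s: rng.random() for s in S} for _ in w]   # a_i : S -> [0, 1)
    c = rng.uniform(0.0, v)                          # constant shift in [0, v]
    def phi(x):
        return c + sum(wi * ai[xi] for wi, ai, xi in zip(w, a, x))
    return phi

rng = random.Random(1)
S, n, v = (0, 1, 2), 3, 0.5
w = [rng.uniform(0.1, 2.0) for _ in range(n)]
worst_gap = float("inf")
for _ in range(200):
    kappa = {x: rng.uniform(-1.0, 1.0) for x in product(S, repeat=n)}
    phi = sample_phi(S, w, v, rng)
    lhs = sum(val * phi(x) for x, val in kappa.items())       # <kappa, phi>
    rhs = psi(kappa, w) + v * max(sum(kappa.values()), 0.0)   # r.h.s. of (9)
    worst_gap = min(worst_gap, rhs - lhs)
print(worst_gap)
```

A nonnegative `worst_gap` is consistent with (9) on the sampled instances; since only a subfamily of the polytope is explored, it says nothing about how tight the bound is.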

3 Applications to concentration of measure 

This section assumes some familiarity with the notion of measure concentration; see the
References section (in particular, [4, 5]) for introductory and survey material. Briefly, we shall concern
ourselves with the metric probability space $(S^n, d_w, P)$ where $S$ is a finite set, $w \in \mathbb{R}_+^n$, $d_w$ is the
weighted Hamming metric defined in (1) and $P$ is a (possibly non-product) probability measure
on $S^n$. For random variables $f : S^n \to \mathbb{R}$, our goal is to bound $\mathbf{P}\{|f - \mathbf{E}f| > t\}$.

The method of martingale differences has been used to prove concentration of measure results 
since the work of Hoeffding, Azuma, and McDiarmid; see the exposition and references in Ej ■ 
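Let $(S^n, d_w, P)$ be as defined above and associate to it the (canonical) random process $(X_i)_{1 \le i \le n}$,
$X_i \in S$, satisfying

\[ \mathbf{P}\{X \in A\} = P(A) \]

for any $A \subset S^n$.

For $1 \le i \le n$, $f : S^n \to \mathbb{R}$ and $y_1^i \in S^i$, define the martingale difference

\[ V_i(f; y_1^i) = \mathbf{E}[f(X) \mid X_1^i = y_1^i] - \mathbf{E}[f(X) \mid X_1^{i-1} = y_1^{i-1}]. \tag{14} \]

Let

\[ \bar V_i(f) = \max_{y_1^i \in S^i} |V_i(f; y_1^i)| \tag{15} \]

and

\[ D^2(f) = \sum_{i=1}^n \bar V_i^2(f). \]

Then Azuma's inequality states that

\[ \mathbf{P}\{|f - \mathbf{E}f| > t\} \le 2 \exp\left( -\frac{t^2}{2 D^2(f)} \right) \tag{16} \]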
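To make the quantities in (14)-(16) concrete, the following sketch computes the maximal martingale differences by brute force for a tiny example. It assumes the joint law is given as a dict from words to probabilities and takes the maximum in (15) only over prefixes of positive probability; the function names and the toy example (three fair coin flips, $f$ the number of ones) are illustrative choices, not from the original text.

```python
import math
from itertools import product

def cond_exp(P, f, prefix):
    """E[f(X) | X_1^i = prefix] for a joint pmf P given as {word: prob}."""
    i = len(prefix)
    mass = sum(p for x, p in P.items() if x[:i] == prefix)
    return sum(p * f(x) for x, p in P.items() if x[:i] == prefix) / mass

def max_martingale_diffs(P, f, S, n):
    """Return [Vbar_1, ..., Vbar_n] as in (15), skipping zero-mass prefixes."""
    vbar = []
    for i in range(1, n + 1):
        best = 0.0
        for y in product(S, repeat=i):
            if sum(p for x, p in P.items() if x[:i] == y) == 0:
                continue
            diff = cond_exp(P, f, y) - cond_exp(P, f, y[:-1])  # V_i(f; y) in (14)
            best = max(best, abs(diff))
        vbar.append(best)
    return vbar

S, n = (0, 1), 3
P = {x: 1 / 8 for x in product(S, repeat=n)}   # three fair coin flips
f = lambda x: float(sum(x))                    # number of ones

vbar = max_martingale_diffs(P, f, S, n)
D2 = sum(v * v for v in vbar)                  # D^2(f)
t = 1.0
azuma = 2 * math.exp(-t * t / (2 * D2))        # r.h.s. of (16)
mean = cond_exp(P, f, ())
tail = sum(p for x, p in P.items() if abs(f(x) - mean) > t)
print(vbar, D2, tail, azuma)
```

For this example each $\bar V_i(f)$ equals $1/2$, so $D^2(f) = 3/4$; the exact tail probability at $t = 1$ is $1/4$, well below the Azuma bound, which exceeds 1 here and is therefore vacuous at this scale.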
(see 0] for a modern presentation and a short proof of (16)).

In [2] and a technique was developed for bounding the martingale difference $V_i(f; y)$ in
terms of the Lipschitz constant of $f$ and mixing properties of the measure $P$. To this end, we
introduce the so-called $\eta$-mixing coefficients (see discussion ibid. regarding the appearance of
these coefficients in earlier work of Marton [H] and Samson jj]).

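For $1 \le i < j \le n$ and $x \in S^i$, let

\[ \mathcal{L}(X_j^n \mid X_1^i = x) \]

be the law (distribution) of $X_j^n$ conditioned on $X_1^i = x$. For $y \in S^{i-1}$ and $z, z' \in S$, define

\[ \eta_{ij}(y, z, z') = \bigl\| \mathcal{L}(X_j^n \mid X_1^i = yz) - \mathcal{L}(X_j^n \mid X_1^i = yz') \bigr\|_{\mathrm{TV}}, \tag{17} \]

where $\|\cdot\|_{\mathrm{TV}}$ is the total variation norm, defined here, for a signed measure $\tau$ on a finite space
$\mathcal{X}$, by

\[ \|\tau\|_{\mathrm{TV}} = \frac{1}{2} \sum_{x \in \mathcal{X}} |\tau(x)|. \]

Additionally, define

\[ \bar\eta_{ij} = \max_{y \in S^{i-1}} \max_{z, z' \in S} \eta_{ij}(y, z, z'). \]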
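The mixing coefficients can be computed by brute force for a small process. The sketch below assumes the total variation norm carries the factor $1/2$, restricts the maxima to prefixes of positive probability, and uses a two-state symmetric Markov chain as a toy example. The function names, the chain and the symbol `delta_matrix` for the upper-triangular matrix with unit diagonal and entries $\bar\eta_{ij}$ above it are illustrative assumptions.

```python
from itertools import product

def cond_law(P, prefix, j):
    """Law of X_j^n given X_1^i = prefix, as {suffix: prob}; None if zero mass."""
    i = len(prefix)
    mass = sum(p for x, p in P.items() if x[:i] == prefix)
    if mass == 0:
        return None
    law = {}
    for x, p in P.items():
        if x[:i] == prefix:
            law[x[j - 1:]] = law.get(x[j - 1:], 0.0) + p / mass
    return law

def tv(mu, nu):
    """Total variation distance with the 1/2 normalization."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def eta_bar(P, S, i, j):
    """Maximum of eta_ij(y, z, z') over y in S^(i-1) and z, z' in S."""
    best = 0.0
    for y in product(S, repeat=i - 1):
        for z, z2 in product(S, repeat=2):
            mu, nu = cond_law(P, y + (z,), j), cond_law(P, y + (z2,), j)
            if mu is not None and nu is not None:
                best = max(best, tv(mu, nu))
    return best

def delta_matrix(P, S, n):
    """Upper-triangular matrix with ones on the diagonal and eta_bar above it."""
    D = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        D[i - 1][i - 1] = 1.0
        for j in range(i + 1, n + 1):
            D[i - 1][j - 1] = eta_bar(P, S, i, j)
    return D

def markov_pmf(n, stay):
    """Joint pmf of a symmetric two-state chain with uniform start (toy example)."""
    T = {0: {0: stay, 1: 1 - stay}, 1: {0: 1 - stay, 1: stay}}
    P = {}
    for x in product((0, 1), repeat=n):
        p = 0.5
        for a, b in zip(x, x[1:]):
            p *= T[a][b]
        P[x] = p
    return P

D = delta_matrix(markov_pmf(3, 0.9), (0, 1), 3)
for row in D:
    print(row)
```

For this chain the coefficients decay geometrically in $j - i$: the entries above the diagonal come out as approximately 0.8 at distance one and 0.64 at distance two.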

The main application of Theorem 2.5 to measure concentration is the following bound on the
martingale difference:

Theorem 3.1. Let $S$ be a finite set, and let $(X_i)_{1 \le i \le n}$, $X_i \in S$ be the random process associated
with the measure $P$ on $S^n$. Let $\Delta_n$ be the upper-triangular $n \times n$ matrix defined by $(\Delta_n)_{ii} = 1$
and

\[ (\Delta_n)_{ij} = \bar\eta_{ij} \tag{18} \]

for $1 \le i < j \le n$. Then, for all $w \in \mathbb{R}_+^n$ and $f : S^n \to \mathbb{R}$, we have

\[ \sum_{i=1}^n \bar V_i^2(f) \le \|f\|_{\mathrm{Lip},w}^2 \, \|\Delta_n w\|_2^2 \tag{19} \]

where $\bar V_i(f)$ is defined in (15).

Remark 3.2. Since $\bar V_i(f)$ and $\|f\|_{\mathrm{Lip},w}$ are both homogeneous functionals of $f$ (in the sense that
$T(af) = |a| T(f)$ for $a \in \mathbb{R}$), there is no loss of generality in taking $\|f\|_{\mathrm{Lip},w} = 1$. Additionally,
since $V_i(f; y)$ is translation-invariant (in the sense that $V_i(f; y) = V_i(f + a; y)$ for all $a \in \mathbb{R}$),
there is no loss of generality in restricting the range of $f$ to $[0, \mathrm{diam}_{d_w}(S^n)]$. In other words, it
suffices to consider $f \in \Phi_{w,n}$. Since essentially this result (for $w_i \equiv 1$) is proved in [2] in some
detail, we only give a proof sketch here, highlighting the changes needed for general $w$. We also
remark that the extension of this result to countable S is quite straightforward, along the lines 
of Lemma 6.1]. 

Proof. It was shown in Section 5 of [2] that if $d_w$ is the unweighted Hamming metric (that is,
$w_i \equiv 1$) and $f : S^n \to \mathbb{R}$ is 1-Lipschitz with respect to $d_w$, then

\[ \bar V_i(f) \le 1 + \sum_{j=i+1}^n \bar\eta_{ij}. \tag{20} \]

This was done by showing that for $1 \le i \le n$ and $y \in S^i$, there is a $g_i : S^n \to \mathbb{R}$ (whose explicit
construction, depending on $y$ and $P$, is given in [2, Eq. (5.2)]), such that for all $f : S^n \to \mathbb{R}$, we
have

\[ |V_i(f; y)| \le |\langle g_i, f \rangle|. \tag{21} \]

It was additionally shown in the course of proving [2, Theorem 5.1] that

\[ \langle g_i, f \rangle = \langle T_y g_i, T_y f \rangle, \]

where the operator $T_y : K_n \to K_{n-i+1}$ is defined by

\[ (T_y h)(x) = h(yx), \qquad \text{for all } x \in S^{n-i+1}. \]

Appealing to [2, Theorem 4.8] - the $w_i \equiv 1$ special case of Theorem 2.5 proved here - we get

\[ \langle T_y g_i, T_y f \rangle \le \Psi_n(T_y g_i). \tag{22} \]

It is shown in [2, Theorem 5.1] that the form of $g_i$ implies that

\[ \Psi_n(T_y g_i) \le 1 + \sum_{j=i+1}^n \bar\eta_{ij}, \tag{23} \]

establishing (20). To generalize (20) to $w_i \not\equiv 1$, we use the fact that if $f \in K_n$ is 1-Lipschitz
with respect to $d_w$, then $T_y f \in K_{n-i+1}$ is 1-Lipschitz with respect to $d_{w_i^n}$. Thus, applying
Theorem 2.5, we get

\[ \langle T_y g_i, T_y f \rangle \le \Psi_{w_i^n,\,n-i+1}(T_y g_i). \tag{24} \]

It follows directly from the definition of $\Psi_{w,n}$ and the calculation in [2, Theorem 5.1] that

\begin{align*}
\bar V_i(f) &\le w_i + \sum_{j=i+1}^n w_j \bar\eta_{ij} \tag{25} \\
&= \sum_{j=1}^n (\Delta_n)_{ij} w_j = (\Delta_n w)_i. \tag{26}
\end{align*}

Squaring and summing over $i$, we obtain (19). □

Corollary 3.3. Let $S$ be a finite set and $P$ a measure on $S^n$, for $n \ge 1$. For any $w \in \mathbb{R}_+^n$ and
$f : S^n \to \mathbb{R}$, we have

\[ \mathbf{P}\{|f - \mathbf{E}f| > t\} \le 2 \exp\left( -\frac{t^2}{2\, \|f\|_{\mathrm{Lip},w}^2 \, \|\Delta_n\|_2^2 \, \|w\|_2^2} \right) \]

where $\|\Delta_n\|_2$ is the $\ell_2$ operator norm of the matrix defined in (18).

Proof. Since by definition of the $\ell_2$ operator norm, $\|\Delta_n w\|_2 \le \|\Delta_n\|_2 \|w\|_2$, the claim follows
immediately via (16) and (19). □
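For orientation, it may help to specialize the corollary to product measures. The short derivation below is a sketch, not taken from the original text. It assumes that $P$ is a product measure on $S^n$, so that $X_j^n$ is independent of $X_1^i$ whenever $j > i$, and it writes $\Delta_n$ for the matrix of (18) and $I_n$ for the $n \times n$ identity.

```latex
\begin{align*}
\eta_{ij}(y, z, z') = 0 \ \text{ for all } i < j
  \ &\Longrightarrow\ \bar\eta_{ij} = 0, \qquad \Delta_n = I_n, \qquad \|\Delta_n\|_2 = 1, \\
\mathbf{P}\{|f - \mathbf{E}f| > t\}
  &\le 2 \exp\left( -\frac{t^2}{2\, \|f\|_{\mathrm{Lip},w}^2 \, \|w\|_2^2} \right).
\end{align*}
```

In particular, for $w_i \equiv 1$ one has $\|w\|_2^2 = n$, which gives a bound of the form $2\exp(-t^2/2n)$ for functions that are 1-Lipschitz in the unweighted Hamming metric.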